In [3]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [4]:
df = pd.read_csv("usbaby_NationalNames.csv")

In [5]:
df.head()


Out[5]:
Id Name Year Gender Count
0 1 Mary 1880 F 7065
1 2 Anna 1880 F 2604
2 3 Emma 1880 F 2003
3 4 Elizabeth 1880 F 1939
4 5 Minnie 1880 F 1746

In [6]:
df.tail()


Out[6]:
Id Name Year Gender Count
1825428 1825429 Zykeem 2014 M 5
1825429 1825430 Zymeer 2014 M 5
1825430 1825431 Zymiere 2014 M 5
1825431 1825432 Zyran 2014 M 5
1825432 1825433 Zyrin 2014 M 5

In [7]:
df.columns.values


Out[7]:
array(['Id', 'Name', 'Year', 'Gender', 'Count'], dtype=object)

1. What is the most common name for U.S. babies?


In [8]:
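# top: the name that occurs in the most rows; freq: the number of rows it occurs in.
# Each row is one name/year/gender record, so this measures how often a name was recorded, not how many babies received it.
df.groupby('Gender')['Name'].describe()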


Out[8]:
Gender        
F       count      1081683
        unique       64911
        top       Delphine
        freq           135
M       count       743750
        unique       39199
        top         Julius
        freq           135
Name: Name, dtype: object
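
Because each row is one name/year/gender record, `describe()` reports the name that occurs in the most rows. One way to count babies instead is to sum `Count` per name within each gender. The sketch below assumes that approach and runs on a few rows copied from outputs shown in this notebook; the `sample` frame and the variable names are illustrative, not part of the original analysis.

```python
import pandas as pd

# Illustrative sample: rows copied from outputs shown in this notebook.
sample = pd.DataFrame(
    [
        ("Mary", 1880, "F", 7065),
        ("Anna", 1880, "F", 2604),
        ("Emma", 1880, "F", 2003),
        ("Emma", 2014, "F", 20799),
        ("Olivia", 2014, "F", 19674),
        ("Zykeem", 2014, "M", 5),
    ],
    columns=["Name", "Year", "Gender", "Count"],
)

# Total babies per (Gender, Name) across all years.
totals = sample.groupby(["Gender", "Name"], as_index=False)["Count"].sum()

# Keep the row with the highest total within each gender.
top_by_gender = (
    totals.sort_values("Count", ascending=False)
    .drop_duplicates("Gender")
    .set_index("Gender")
)
print(top_by_gender)
```

On the full data the same pattern would be applied to `df` in place of `sample`; the result on this small sample says nothing about the real answer.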

2. In which year were the most babies born?


In [9]:
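# Counts rows (name/gender records) per year, i.e. how many distinct names were recorded, not how many babies were born
df['Year'].value_counts()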


Out[9]:
2008    35045
2007    34931
2009    34684
2006    34069
2010    34041
2011    33869
2012    33684
2013    33203
2014    33044
2005    32533
2004    32035
2003    31173
2002    30560
2001    30261
2000    29763
1999    28544
1998    27891
1997    26965
1996    26419
1995    26080
1994    25997
1993    25957
1992    25416
1991    25104
1990    24713
1989    23767
1988    22358
1987    21395
1986    20640
1985    20075
        ...  
1909     4227
1908     4018
1907     3948
1900     3731
1905     3656
1906     3633
1904     3561
1903     3389
1902     3362
1898     3264
1901     3153
1896     3091
1895     3049
1899     3042
1897     3028
1894     2941
1892     2921
1893     2831
1890     2695
1891     2660
1888     2651
1889     2590
1886     2392
1887     2373
1884     2297
1885     2294
1882     2127
1883     2084
1880     2000
1881     1935
Name: Year, dtype: int64
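
`value_counts()` on `Year` tallies rows, so it shows how many name records each year has. To estimate the number of babies recorded per year, one option is to sum `Count` by year. A minimal sketch, assuming that reading of the question and using a few rows copied from outputs in this notebook (the `sample` frame is illustrative):

```python
import pandas as pd

# Illustrative sample: rows copied from outputs shown in this notebook.
sample = pd.DataFrame(
    [
        ("Mary", 1880, "F", 7065),
        ("Anna", 1880, "F", 2604),
        ("Emma", 1880, "F", 2003),
        ("Emma", 2014, "F", 20799),
        ("Olivia", 2014, "F", 19674),
    ],
    columns=["Name", "Year", "Gender", "Count"],
)

# Sum the Count column per year, largest total first.
births_per_year = (
    sample.groupby("Year")["Count"].sum().sort_values(ascending=False)
)
print(births_per_year)
```

The first index entry of `births_per_year` is then the year with the most recorded babies in whatever frame is passed in.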

3. Rank the baby names in ascending order of count.


In [10]:
df[['Name', 'Count']].sort_values(by='Count', ascending=True).head(5)


Out[10]:
Name Count
1825432 Zyrin 5
1001393 Kentrail 5
1001394 Kentrel 5
1001395 Kenyada 5
1001396 Kenzo 5
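
Many rows share the minimum count of 5, so the order among them in the output above is arbitrary. A small sketch of one way to make the ranking deterministic, by adding `Name` as a secondary sort key; the `names` frame is an illustrative sample built from rows shown in this notebook:

```python
import pandas as pd

# Illustrative sample: names and counts copied from outputs in this notebook.
names = pd.DataFrame(
    {
        "Name": ["Zyrin", "Kentrail", "Kentrel", "Kenyada", "Kenzo", "Mary"],
        "Count": [5, 5, 5, 5, 5, 7065],
    }
)

# Sort by Count first, then alphabetically by Name to break ties.
ranked = names.sort_values(by=["Count", "Name"], ascending=True).reset_index(drop=True)
print(ranked)
```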

4. From 1980 to 1989, which names are most common?


In [27]:
recent = df[(df['Year'] > 1979) & (df['Year'] < 1990)]
recent['Name'].describe()
# Note: top/freq give the name with the most rows in 1980-1989 (Terrence, 20 rows),
# which is not necessarily the name given to the most babies.


Out[27]:
count       205714
unique       34849
top       Terrence
freq            20
Name: Name, dtype: object
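
A `freq` of 20 only says that the top name has 20 rows in the decade. To rank names by babies born, one option is to sum `Count` per name within the year range. No 1980s rows are shown in this notebook, so the sketch below uses made-up placeholder rows (`NameA`, `NameB`) purely to show the mechanics; none of these values come from the dataset.

```python
import pandas as pd

# Made-up placeholder rows (NOT from the dataset), only to show the mechanics.
toy = pd.DataFrame(
    [
        ("NameA", 1979, "F", 500),
        ("NameA", 1980, "F", 100),
        ("NameA", 1985, "F", 50),
        ("NameB", 1985, "M", 120),
        ("NameB", 1995, "M", 900),
    ],
    columns=["Name", "Year", "Gender", "Count"],
)

# between() is inclusive on both ends, so this keeps 1980 through 1989.
in_1980s = toy[toy["Year"].between(1980, 1989)]

# Total babies per name within the decade, largest first.
totals_1980s = (
    in_1980s.groupby("Name")["Count"].sum().sort_values(ascending=False)
)
print(totals_1980s)
```

Rows outside the range (1979 and 1995 here) are excluded before summing, so a name that is popular in other decades does not inflate the result.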

5. Do baby boys outnumber baby girls in 2014?


In [35]:
df_2014 = df[df['Year'] == 2014]
df_2014.head()


Out[35]:
Id Name Year Gender Count
1792389 1792390 Emma 2014 F 20799
1792390 1792391 Olivia 2014 F 19674
1792391 1792392 Sophia 2014 F 18490
1792392 1792393 Isabella 2014 F 16950
1792393 1792394 Ava 2014 F 15586

In [57]:
# Bars show the number of rows (distinct names) per gender in 2014, not the number of babies
df_2014['Gender'].value_counts().plot(kind='bar')


Out[57]:
<matplotlib.axes._subplots.AxesSubplot at 0x13cb176d8>
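
The bar chart compares how many distinct names each gender has in 2014. To compare babies, one option is to sum `Count` per gender. A minimal sketch under that assumption, on a few rows copied from outputs in this notebook (the `sample` frame is illustrative, and its totals do not reflect the full data):

```python
import pandas as pd

# Illustrative sample: rows copied from outputs shown in this notebook.
sample = pd.DataFrame(
    [
        ("Mary", 1880, "F", 7065),
        ("Emma", 2014, "F", 20799),
        ("Olivia", 2014, "F", 19674),
        ("Sophia", 2014, "F", 18490),
        ("Zykeem", 2014, "M", 5),
        ("Zymeer", 2014, "M", 5),
    ],
    columns=["Name", "Year", "Gender", "Count"],
)

# Restrict to 2014, then total the Count column per gender.
births_2014 = sample[sample["Year"] == 2014].groupby("Gender")["Count"].sum()
print(births_2014)
```

Calling `births_2014.plot(kind='bar')` would chart these totals in the same style as the cell above.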

6. Which baby names start with T?


In [76]:
starts_with_t = df['Name'].str.startswith("T")
df[starts_with_t].head()


Out[76]:
Id Name Year Gender Count
111 112 Theresa 1880 F 153
159 160 Tillie 1880 F 83
217 218 Teresa 1880 F 50
315 316 Tennie 1880 F 26
385 386 Tena 1880 F 19

In [77]:
df['Name'].str.startswith("T").value_counts()


Out[77]:
False    1723818
True      101615
Name: Name, dtype: int64
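
The counts above are rows, so a name recorded in many years is counted many times. To list each matching name once, the mask can be combined with `unique()`. A small sketch, where `names_starting_with` is a hypothetical helper and `sample` holds rows copied from outputs in this notebook:

```python
import pandas as pd

# Illustrative sample: rows copied from outputs shown in this notebook.
sample = pd.DataFrame(
    [
        ("Mary", 1880, "F", 7065),
        ("Emma", 1880, "F", 2003),
        ("Theresa", 1880, "F", 153),
        ("Tillie", 1880, "F", 83),
        ("Teresa", 1880, "F", 50),
        ("Tennie", 1880, "F", 26),
        ("Tena", 1880, "F", 19),
        ("Emma", 2014, "F", 20799),
    ],
    columns=["Name", "Year", "Gender", "Count"],
)


def names_starting_with(frame, prefix):
    """Return the distinct names in `frame` that start with `prefix`, sorted."""
    mask = frame["Name"].str.startswith(prefix)
    return sorted(frame.loc[mask, "Name"].unique())


t_names = names_starting_with(sample, "T")
e_names = names_starting_with(sample, "E")
print(t_names)
print(e_names)
```

"Emma" appears in two rows of the sample but only once in `e_names`, which is the difference between counting rows and counting names.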

7. Are more boys born than girls, or vice versa?


In [63]:
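# Counts rows (name/year records) per gender; summing the Count column per gender would count babies instead
df['Gender'].value_counts()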


Out[63]:
F    1081683
M     743750
Name: Gender, dtype: int64

In [ ]:
plt.style.use("fivethirtyeight")
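df_2014 = df[df['Year'] == 2014]
# Plot only the ten largest counts; one bar for each of the 33,044 rows in 2014 would be unreadable
df_2014.nlargest(10, 'Count').plot(kind='barh', x='Name', y='Count', legend=False)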
